ABSTRACT



The invention provides an automated system for replicating
published web content and associated advertisements in the 
context of a hosting web site. At the hosting web site, the 
invention includes the process of brokering a client brows- 
er's request for a web page, analyzing the returned content 
and splitting it into component elements, extracting the 
desired component elements, recasting the desired elements 
in the look and feel of the hosting site and sending the recast 
content to the requesting client as a web page. Once the 
reformatted file is received at the client, the client browser 
interprets the HTML in the web page, presenting the content 
in the context of the hosting web site. On the content 
provider's web site, the details of the transaction in the web 
server logs are preserved, proxying a direct page view and 
ad impression. 
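
The split, extract and recast steps described above can be illustrated with a minimal sketch. It is not the patented implementation: the marker strings in FILTER, the HOST_TEMPLATE layout, the sample SOURCE page and the function names are all hypothetical, chosen only to show one way a filter could pull components out of a returned page and place them in a hosting-site template. The ad banner keeps its original URL, so the client browser would still request it from the content provider.

```python
from string import Template

# Hypothetical filter definition: each desired component is bounded by a
# start marker and an end marker found in the content provider's HTML.
FILTER = {
    "title": ("<title>", "</title>"),
    "ad": ("<!-- ad start -->", "<!-- ad end -->"),
    "article": ("<!-- article start -->", "<!-- article end -->"),
}

# Hypothetical hosting-site template supplying the host's look and feel.
HOST_TEMPLATE = Template(
    "<html><head><title>Host: $title</title></head>"
    "<body><div class='host-nav'>Home | News</div>"
    "<h1>$title</h1><div class='article'>$article</div>"
    "<div class='ad'>$ad</div></body></html>"
)


def extract(html, filter_def):
    """Split the returned page into the components named by the filter."""
    parts = {}
    for name, (start, end) in filter_def.items():
        i = html.find(start)
        j = html.find(end, i + len(start)) if i != -1 else -1
        # A component whose markers are missing is left empty.
        parts[name] = html[i + len(start):j].strip() if j != -1 else ""
    return parts


def recast(html, filter_def=FILTER, template=HOST_TEMPLATE):
    """Place the extracted components into the hosting-site template."""
    return template.substitute(extract(html, filter_def))


# Sample content provider page (invented for illustration).
SOURCE = (
    "<html><head><title>Storm Hits Coast</title></head><body>"
    "<!-- ad start --><img src='http://ads.example.com/banner.gif'>"
    "<!-- ad end -->"
    "<!-- article start --><p>Heavy rain fell overnight.</p>"
    "<!-- article end -->"
    "</body></html>"
)

page = recast(SOURCE)
```

In this sketch a page that lacks the expected markers yields empty components rather than an error, which leaves room for a fallback to other filters.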

37 Claims, 11 Drawing Sheets 

DISTRIBUTION MECHANISM FOR
FILTERING, FORMATTING AND REUSE OF 
WEB BASED CONTENT 

BACKGROUND OF THE INVENTION 
The present invention relates generally to data processing
systems. More particularly, it relates to managing
and formatting electronically-published material distributed
over a computer network.

The World Wide Web is the Internet's multimedia infor-
mation retrieval system. In the Web environment, client 
machines effect transactions to Web servers using the Hyper- 
text Transfer Protocol (HTTP), which is a known application 
protocol providing users access to files (e.g., text, graphics, 
images, sound, video, etc.) using a standard page description 
language known as Hypertext Markup Language (HTML). 
HTML provides basic document formatting and allows the 
developer to specify "links" to other servers and files. In the 
Internet paradigm, a network path to a server is identified by 
a so-called Uniform Resource Locator (URL) having a 
special syntax for defining a network connection. Use of an 
HTML-compatible browser (e.g., Netscape Navigator or 
Microsoft Internet Explorer) at a client machine involves
specification of a link via the URL. In response, the client 
makes a request to the server (sometimes referred to as a
"Web site") identified in the link and, in return, receives a
document or other object formatted according to
HTML. 
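
As a small illustration of the URL and request mechanics just described, the sketch below splits a URL into its server and path parts and composes the text of a simple HTTP GET request. The URL is a made-up example and build_get_request is a hypothetical helper; no network connection is made.

```python
from urllib.parse import urlsplit


def build_get_request(url):
    """Compose the text of a minimal HTTP/1.0 GET request for a URL."""
    parts = urlsplit(url)
    # An empty path refers to the server's root document.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n\r\n"


# Hypothetical URL used only for illustration.
request = build_get_request("http://www.example.com/news/article1.html")
```

The server named in the URL would answer such a request with response headers followed by the HTML document.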

Among the many challenges in running a successful web
site is the constant creation and updating of the web pages and
other files, i.e. web content, to keep the site fresh, new
and attractive to web users. Web sites which do not update
their content on a regular basis tend to lose favor.
Eventually, fewer "hits" are logged on the web site's pages
as fewer users view the information or advertisements which 
the web site is publishing. As web based advertising fees are 
typically based on the number of hits a page or site receives, 
this reduction will directly and adversely affect the revenues 
of the web site. Of course, the constant update of the web 
content, while necessary to maintain the popularity of the 
site, is very expensive in terms of manpower and time. 

Furthermore, much of the information on a particular web 
site is redundant when compared to information available on 
other similar sites. Some of this duplicate information rep- 
resents differences in opinion and is no doubt the sign of a 
tolerant and free society. However, much of the information 
is simply a duplication of the same news on each web site. 
From the perspective of the web site content provider, it 
would be efficient if some of the information found on other
sites could be reused or "hosted" on his site. Thus, additional 
manpower for writing and entering articles on the web 
server can be reduced or eliminated. Of course, such reuse 
is subject to the copyright laws and must be the subject of 
an agreement with the content provider of the source
material.

While Web-based content exists in abundance, it is not 
necessarily easy to persuade a web content provider to share 
content on a low or no charge basis. This is especially true 
for Web-based news articles, as these news articles typically 
represent the major revenue generating content for the 
publisher by carrying advertising banners above and/or 
below the article text. Therefore, the web publishers are apt 
to charge a large amount for licensing the content to other 
sites for reprinting. Each reprint represents a loss of revenue 
under the standard arrangement of exporting the content in 
raw format to the licensing host and that host posting the 
articles on their own site without the publisher's advertise- 
ments. 

Further, even if a web site operator could find a content
provider willing to share their content at economically
favorable terms, other problems exist. A single content
provider is unlikely to provide the complete gamut of
articles which the hosting web site would like to serve to its
web clients. It would be preferable that the hosting site be
able to use content from a variety of potential content
providing web sites. Again, the likelihood of finding many
willing quality web content providers is even lower. Yet
even if this feat were accomplished, as each site has its own
look and feel, if the content were presented in the format in
which it originally appeared on each of the web sites, the
hosting site would present a disjointed hodgepodge of
material. It is hardly the professional image that the hosting
site should ideally project.

It is unlikely that a web content provider who is essentially
sharing his content for free will be willing to install
special software or specially format his information for the
hosting site. If the material comes in raw format, considerable
manpower must thus be devoted to making borrowed
material on the hosting site look as though it was specifically
created for the site. This effort is naturally compounded
where material comes from a range of web content providers.
Further, there is likely to be some lag between the time
that the web content is available on the content provider's
web page and its appearance on the hosting site. This dilutes
the desired appearance of the hosting site having the latest
and greatest material.

In reality, the hosting site is unlikely to find many partners
without some convincing demonstration that its reuse of the
material will benefit the original content provider in some
way, or at least will not endanger his revenue stream.
The present invention solves this important problem. 

SUMMARY OF THE INVENTION

It is an object of the invention to reduce the expense and
effort of providing content in a new hosting web site.

It is another object of the invention to reuse content from
other web sites with little to no licensing fees.

It is another object of the invention to allow a content 
provider web site to maintain or expand a revenue base 
through ad impression. 

It is another object of the invention to reuse content from
a variety of different content providers, some of which may
use radically different formats and other content.

It is another object of the invention to adapt content from
other web sites to the appearance of the hosting web site so
that the content from a plurality of web sites appears native
to the hosting web site.

It is another object of the invention to automatically
update material on the hosting web site as it changes on
the content provider web sites.

It is another object of the invention to reuse web content 
in a plurality of hosting site web pages each with a respec- 
tive appearance. 

It is another object of the invention to reuse web-based
content without requiring a content provider web site to
modify content or install special purpose software.

It is another object of this invention to enable a publisher
of an electronic document to control the reformatting of the
document by a hosting site.

These objects and others are accomplished by managing
copyrighted content on the Internet and World Wide Web by
means of a filtering and formatting service located on a 
hosting server. The invention provides an automated system 
for replicating published web content and associated
advertisements in the context of a hosting web site. At the
hosting web site, the invention includes the process of
brokering a client browser's request for a web page, analyzing
the returned content and splitting it into component elements,
extracting the desired component elements, recasting the
desired elements in the look and feel of the hosting site and
sending the recast content to the requesting client as a web
page. Once the reformatted file is received at the client, the
client browser interprets the HTML in the web page, presenting
the content in the context of the hosting web site. On the
content provider's web site, the details of the transaction in
the web server logs are preserved, proxying a direct page view
and ad impression.

The foregoing has outlined some of the more pertinent
objects and features of the present invention. These objects
should be construed to be merely illustrative of some of the
more prominent features and applications of the invention.
Many other beneficial results can be attained by applying the
disclosed invention in a different manner or modifying the
invention as will be described. Accordingly, other objects
and a fuller understanding of the invention may be had by
referring to the following.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention
and the advantages thereof, reference should be made to
the following Detailed Description of the Preferred Embodiment
taken in connection with the accompanying drawings
in which:

FIG. 1 is a representative system in which the present
invention is implemented.

FIG. 2 is a simplified block diagram of a requesting client,
hosting server and plurality of content provider servers
which illustrates an overview of the process of the present
invention.

FIG. 3 is an illustrative example of an unchanged source
web page as it would normally be presented by a client
browser as retrieved from the content provider web server.

FIG. 4 is an illustrative example of the reformatted web
page as presented at the client browser after having undergone
the processing of the present invention.

FIGS. 5A and 5B are more detailed flowcharts of a
preferred method of the processes which occur at the hosting
server.

FIG. 6 is a pictorial representation of a hosting filter
definition interface.

FIG. 7 is a block diagram of the major components of the
data processing system unit on which the invention may be
practiced.

DETAILED DESCRIPTION OF THE DRAWINGS

A representative system in which the present invention is
implemented is illustrated in FIG. 1. A plurality of Internet
client machines 10 are connectable to a computer network
Internet Service Provider (ISP) 12 via a network such as a
dialup telephone network 14. As is well known, the dialup
telephone network usually has a given, limited number of
connections 16a-16n. ISP 12 interfaces the client machines
10 to the remainder of the network 18, which includes the
hosting server 19 and a plurality of web content provider
server machines 20. A client machine typically includes a
suite of known Internet tools, including a Web browser 13,
to access the servers of the network and thus obtain certain
services. These services include one-to-one messaging
(e-mail), one-to-many messaging (bulletin board), on-line
chat, file transfer and browsing. Various known Internet
protocols are used for these services. Thus, for example,
browsing is effected using the Hypertext Transfer Protocol
[illegible] the World Wide Web [illegible] multimedia
information retrieval system.

[illegible] for extracting Web-based content, especially, but
not limited to, Web-based news articles, from content provider
Web sites for use by the hosting or "pass-through" Web site.
These articles typically are revenue-generating content for
the publisher by carrying advertising banners above and/or
below the article text. Therefore, the publishers must benefit
[illegible] arrangement provided by the hosting site to be
[illegible] the web content provider maintains his ad
[illegible] advertisements are [illegible] a transparent manner.
As the articles are also [illegible] impressions are being
solicited from two sites [illegible].

During configuration, the pass through publisher 101 at
the hosting site 103 is provided with the URLs 105 for the
desired content provider web servers 107 and a set of filters
109 for the content publisher's document templates 111. For
illustration, a single client 113 and a single web
content server 107 are depicted. However, the reader should
understand that a plurality of clients and web content servers
are typically interconnected through the agency of the
hosting site. Upon a request 115 from a client 113 for a given
web page, typically made through an HTTP request from the
resident browser, the process for providing a page using the
pass through mechanism begins. Next, after having established
that the requested page originates at the web content
server 107, the hosting site makes a request [illegible]
a more up to date version of the page is [illegible]
content provider than is cached locally, the page is returned
[illegible] technology, the web page is typically an
[illegible] references to the component .wav, .mov, .gif
[illegible] .jpeg [illegible] together make up the web page as
perceived by the user. Secondary page components such as
cascading style sheets and Java applets can also be accommodated
by the invention. The list above is merely exemplary;
any component on a web page can be extracted and
recast into the hosting site context by the present invention.

Next, the pass through publisher 101 retrieves the filter
definitions and policies from the filter database 109 for this
particular content provider web site. Using the filters and the
retrieved HTML page, the pass through publisher 101 parses
the source for desired components of the page.

Typically, this is the title of the article, the ad banner or
banners and the article text itself, although other items on the
page are potentially desirable. These pieces of content are
then recast into a new web page by means of an HTML
template 121 that matches the look and feel of the hosting
Web site. The new page includes the graphics of the hosting
provider as well as the navigational features of the hosting
site. This page is then sent 123 to the client 113 for
presentation by the browser. In a typical web interaction
between browser and server, once the browser receives the
HTML page, it issues additional requests for the component
files such as .gifs, e.g., ad banners. For the ad banners
themselves, the new page preserves the call 125 back to the
content provider so that the correct advertising content is
presented. It is common that each request of a web page
from a server can be refreshed with a different advertisement.

In this way, the end user receives a page with graphic and
navigation features from the hosting Web site that has an
embedded article from the publisher and an advertisement
served from the publisher's site. The final result is content
viewed by the end user in the host site's native Web context,
with an ad banner served from the original publisher, thereby
preserving their revenue stream.

It should also be noted that the article text is preferably
cached in a local cache 131, on the hosting Web server 103,
for faster access and guaranteed access in the event that the
publisher's Web site becomes inaccessible. The invention
encompasses several variations in the types of information
parsed from the page and cached locally. Some of this
information may be incorporated in the recast HTML page
and some may be used for version checking. For example,
information in the HTML header such as "last modified",
"content length" and "content type" could be kept with the
article text so that the copy in the cache can be compared to
the version available at the content provider site. However,
in the preferred embodiment, the applicants have found it to
be more efficient to simply compare the "last modified" data
in the HTML header with the "last modified" data in the
hosting system's cache file. Remember that the hosting site
103 makes the request 117 for the client to preserve the
accounting data for the content provider web site 107. Since
the header data is among the first [illegible]
response, after a simple compare establishes that the cached
version and the version at the content
provider web site are the same, the transmission from
the content provider can be ended. The hosting system 103
then uses the cached copy of the article. In the event of no
response from the content provider web site, a cached copy
of the article is used. When there is no cached copy of an
article, or the compare establishes that a more recent version
of the article is available, the entire transmission from
the content provider is received for processing.

Alternatively, rather than waiting for a client request, the
'freshness' of the cached content [illegible]
automatically generating HTTP requests from the cached
URLs and monitoring data in the HTTP headers when the
page is hit in the background, updating the cache any time
the web content provider changes their data.

The aim of caching pass-through web content is to
maximize efficiency by minimizing network bandwidth
requirements while preserving the transparency of the
transaction. By caching copies of the parsed content on the
hosting server, serving the content to the end user directly
and simulating their 'hit' on the publisher's site in the
background, the end user gets content directly from the hosting
site without having to wait for data to travel from the content
web provider's site to the hosting site. However, this method
only assures a correct count for the web content provider
whose advertising systems use a secondary HTTP request
for the image retrieval to generate the ad impression. For
systems that rely on dynamic HTML generation to log ad
impressions, the ad content must be retrieved for each user
and not cached on the host site. The static portion of the
page, i.e. the article, however, can be cached, since it
remains the same for each visit at least for a relatively long
period of time. Serving the recast page to the end user will
be delayed by the network for retrieving the ad content, but
if the publisher's site becomes unavailable, the end user will
not be affected.

An alternative embodiment to the invention is to provide
a client based Java applet that retrieves dynamic content
from the web content provider's server directly from the end
user's browser. This allows the recast page to be loaded from
the hosting site's cache to the client browser and invoking
the Java applet for the retrieval of marked dynamic content.
This reduces the network bottleneck at the hosting site for
dynamic HTML ad generation.

Before describing the hosting process in greater detail, the
reader's attention is directed to FIGS. 3 and 4 which
respectively show the appearance of a content provider web
page as originally sent and the recast web page as sent from
the hosting site. It should be understood that the page in FIG.
3 is never actually displayed by the client browser, however,
showing the page as it would have been presented if the
client had made the request directly to the content provider
web site is useful to understand the principles of the
invention.

As shown in both figures, the browser window 201
bounds each web page and contains standard graphical user
interface elements such as title bars, menu items and scroll
bars. The browser shown is Netscape Communicator, showing
that a standard client browser can be used unmodified to
practice the invention. In the client area 203 showing the
unmodified page, the logo banner 205, title area 207 and
article text 209 are shown. Under the logo banner 205, a set
[illegible] will retrieve other pages from the content
provider server. Finally, at the bottom of the page, an ad
banner 213 is presented.

In FIG. 4, the recast page is shown in client area 303. In
this example, the logo banner 305 is preserved, but moved
to a new location (centered). The title area 307 and article
text 309 have changed location, font and font size and
[illegible] content provider web pages have [illegible]
page. Note also that navigational features 315 and 317 native
[illegible] hosting web site [illegible] a dis-[illegible]
has also been added. Of course, those
skilled in the art will recognize that the examples of "desired
[illegible] exemplary. The example of the top ad,
article and bottom ad is common to many web news articles,
[illegible] invention allows the hosting site to extract and
recast [illegible] elements from the [illegible].

Depending upon the policy for the web content provider,
variations in which elements are preserved in the recast page
are possible. For example, the logo [illegible]
feature. It may be removed or reduced in size or replaced
by a different logo stored in the filter definition. [illegible]
are optional; they could be [illegible]cated.
As a technical matter, the ad banner 313 is optional,
however, from a practical standpoint to obtain content at a
low licensing fee, they are probably mandatory. Other items
such as copyright notices are not shown in the figure, but
could be preserved.

The process by which a new page is registered into the
hosting system is depicted in FIG. 5A. It begins in step 401,
when a new page or some other registration action is
detected. Step 403 determines whether the page is from an
existing account, i.e. an existing web content provider web
site. If not, a new account is started, step 405. The account
or folder is a convenient place to store filter definitions,
policies and any transaction information which pertains to a
particular content provider.

The test in step 407 determines whether it is a new page,
either because of a new URL or new version, which has
started the registration process. If it is not a new page, step
409 determines whether it is a request to create or change
a filter definition which has started the registration process.
For the purposes of this diagram, the policy for a content
provider is considered part of the filter definitions although
the information can certainly be kept in a separate file. The
process will exit in step 411 if there is no filter definition to
change.

In step 413, it is determined whether there is a suitable
filter definition in the account folder for the content provider
for the new page. As most pages in a web site share a
common format and style, it is envisioned that a relatively
small set of filter definitions can be used for all of the pages
from a particular site. If there is no existing filter definition
suitable, in step 415, a new filter definition is created for the
page. There is more discussion on the creation of filter
definitions and policies below in connection with FIG. 6.

In step 417, the page, i.e. URL, is associated with the
appropriate filter definition and in step 419 the appropriate
changes to the account, URL and filter definition files are
made. Optionally, the new page can be processed and cached
as part of registration. Thus, in step 421, the filter definition
is used by the pass through publisher to extract the desired
portions of the page. In step 423, these portions of the page
are cached for retrieval in the event of a client request. The
process ends, step 425.

In FIG. 5B, the process for parsing and reusing web
content by the pass through publisher is shown. When a
client requests a new document from the pass through
publisher at the hosting web site, the requesting web client
information is recorded, and a request is made by the hosting
web site to the content provider's web server on behalf of the
requesting web client. The HTTP request to the web content
provider server is similar to that which the requesting client
could make to the content provider site directly, except with
the hosting site as the originator. This assures that the web
content server's log files record a visit by the requesting
client, which is essential for preserving the content
provider's revenue stream.

As mentioned above, the hosting site preferably caches
content likely to be requested by a client to improve the
speed and reliability of the hosting web site pages. In this
way, if the document has not changed since the pass through
publisher last polled the site, it is retrieved from the local
cache after registering the "hit" on the remote server. This
reduces Internet bandwidth requirements and improves
performance on both the hosting web server and the web
content provider server.

However, for the process depicted in FIG. 5B, new
content has been retrieved from the web content provider
web server, step 451. Once the document content has been
retrieved from the host provider, the filter database is
searched for the appropriate filter definition, step 453, the
filter definition kept for the web content provider. The
information in the filter definition will help the pass through
publisher parse the document structure of the web page,
extracting the desired information. In step 457, a test is
performed to determine whether the parsing was a success.

If a filter definition for the page or web content provider
is not found, or the first attempt using the associated filter
definition was not a success, the pass through publisher can
fall back to a series of default filters which will assist in
parsing the data, step 459. The hosting site will still be able
to present the reformatted content, however, the process will
not be as efficient as through an existing filter definition.
This "best guess" approach utilizes several methods, including
looking for common references to advertising engines,
etc. As discussed below, the publisher can also look for a set
of embedded tags indicating the desired content. Any document
that a filter can not be found for can be logged,
allowing staff to later create appropriate filter definitions. In
practice, however, hosting sites employing the pass through
technique will be able to define templates appropriate to all
"rehosted" content. Most content provider sites employ a
standard look and feel in their documents, allowing for
filters that are appropriate for large numbers of documents
found on a particular web site, if not every document on the
entire provider web site.

These excerpted components are then run through the
pass-through publisher's "post-processing" system to assure
that they do not contain "dangerous" formatting code fragments
that could adversely affect the hosting web site, step
461. For example, when articles are extracted from within a
TABLE structure, HTML TABLE fragments could be left in
the filtered HTML that could destroy formatting on the
hosting web site. As another example, interactive or browser
dependent scripting code could be found in the filtered
HTML that may not make sense in the document's new
context. The post filtering tasks should also include fixing
any relative URLs embedded in the original web page to
preserve their original function. Optionally, this can be
accomplished by pointing the URLs to the hosting site for
handling. For example, many documents are split into
several pages by the web publisher. The link to the next part
of the article can be translated to a hosting site link so that
the next part is automatically served in the hosting site's
context. The relative link could also be translated to an
absolute link so that it will still lead to the content provider
server even when selected in the recast page. As would be
readily understood by those skilled in the art, these post
filtering tasks could easily be performed by one of the filters,
however, the applicants have found it to be convenient to
separate the tasks, thus simplifying the construction of the
filter definitions.

The component HTML file, once extracted, separated, and
post filtered, is then reformatted into a new document in the
style and context of the hosting web site, step 463. This is
done by another component of the pass through publisher, a
web publishing application that creates a "dynamic publishing
template". The web publisher injects the excerpted
content, titles, copyright statements and logos as received
from the post filtering process. In step 465, the desired
components are cached, which may include components
useful in determining the version of a web page, but are not
used in the recast page. In step 467, the recast page is sent
to the requesting client. The process ends, step 469. Once
presented by the requesting browser, the content of the
hosting web site appears seamless to the user, although it
may originate at a plurality of web content provider sites as
well as the hosting site itself.

Since the code from the original content has been
abstracted and separated from its style and formatting, it is
now possible to format it before sending it to the user in any
of a variety of styles. This can prove useful in a variety of
situations. It is common for the web sites of several smaller
organizations to be "hosted" by an organization with the
technical expertise and capital equipment allowing the
smaller organizations to concentrate on creating the content
for the web sites rather than the details of maintenance of the
server machines. A single pass through publisher could
provide a different look and feel for each of the different
organizations hosted on its web servers. Alternatively, a
single hosting web site could provide several different
alternative formats. The choice of which format to present to
a particular user could be based on the organization or
location associated with the user. Alternatively, the web site
could allow the user to choose from among the different
formats based on a registration of his preferences in a user
profile. Thus, the look and feel of a web site can change
dependent upon the requesting audience.

The invention provides a mechanism which allows a
hosting web site to provide a wide variety and great amount
of third party Web content without incurring high licensing
costs. Another benefit of the pass through system is in cost
savings. Unlike a traditional system of licensing and
republishing content, the hosting system does not require a large
production staff since the republishing and re-styling of the
content is automatic. A hosting system can provide a much
faster production cycle and assure that the content does not
quickly go "out of date".

A discussion of filter definition creation follows. The
collection of document filters helps the pass through engine
understand the structure of a wide variety of web documents.
The document filters can be created through several
methods, including the analysis of the HTML source code,
imbedded comments or delimiters and through comparisons
with similar documents. Once the style of the web site is
understood, a filter can be developed to look for the portion
of the original document which the hosting site is interested
in reformatting. Inconsistencies in document style or
structure can be neutralized by the use of custom code [...]

[...] and the content itself. The title area includes the title of
the web page and is typically marked by HTML tags. The
primary and secondary advertisements usually occur at the
top and bottom of the web page, but may be located at
different locations. They are typically marked in the HTML
by tags or comments indicating an advertisement. Depending
on various factors, such as the desired look and feel for
the hosting web site, the cross-publishing agreement with
the content provider, i.e. allowing for republishing certain
types of web content but not others, and the filter, the content
may be very plain. A "bare bones" filter may strip out any
extraneous links or "side bars" of information. Alternatively,
the content may be a verbatim copy of a selected portion of
the page.

In addition to providing the system with information on
separating the components of the document, filter definitions
also include publisher specific information such as the logo
or copyright statements and policies that should be used by
the pass through publisher when formatting the new version
of the document.

Alternatively, the logo and copyright statements could be
excerpted components like the title, ads and content.

The filter definitions can also include the "policy" for a
particular web content provider. Any number of policies can
be established based on publisher, article, article section or
any other distinguishing criteria that can be identified.
Policies might govern whether content is licensed for use on
an intranet, but not on the Internet, or vice versa, or both;
how many times a document may be served off a host site;
whether the publisher's ads should be passed through or not;
what kind of caching strategy should be applied; what cost
each view of the article carries for the host site; and so on.
The specific types of policies available will depend on the
context in which pass-through is being used, whether as a

imbedded in the web page and detailed in the filter defini- commercial product, integrated into custom solutions, or 

tion. bundled with other products. 

A CGI or other program can be used to create filter The client machine may be a personal computer such as 

definition files. FIG. 6 shows a user interface in which tags a desktop of notebook computer, e.g., an IBM or IBM- 
or text can be entered manually so that the pass through ^ compatible machine running under the OS/2® operating 

publisher can more easily parse a web content provider's system, an IBM ThinkPad® machine, or some other Intel 

web pages. In the browser window 501, client area 503 x86 or Pentium®-based computer nmning Windows *95 (or 

contains a plurality of controls for a set of desired compo- the like) operating system. Of course, the invention may be 

nents. Entry fields 505, 507, 509, 511, 513, 515, 517, 519 nm on a variety of computers or collection of computers 

and 521 are respectively used to enter the filter name, the under a number of different operating systems. The com- 

logo name, a copyright string, a beginning of the top banner puters on which the cUent software and the hosting and 

ad, the ending of top baimer ad, the beginning of the article content provider web site reside could be, for example, a 

text, the ending of the article text, the beginning of the personal computer, a mini computer, mainframe computer or 

bottom ad and the ending of the bottom ad. Note that certain a hand held computer. AlUiough the specific choice of 

items such as logo name and copyright string could be computer is Umited only by processor speed and disk storage 

replacements for those which occur in the web page, rather requirements, it is typical that the client computer will be 

than indicators of the desired content. somewhat "lighter weight" than the web server computers. 

A set of check boxes 523 allows the filter designer to For example, computers in the IBM PC series of computers 

indicate which of these items be wishes to keep on the recast could be used as clients in the present invention. One 
page. The table stripping check boxes 525 indicate whether 55 operating system which an IBM personal computer may run 

table formatting should be stripped from certain areas of the is IBM^s OS/2 Warp 4.0. For the web servers, the computer 

content provider's page. Custom filter code can be entered system might be in the IBM RISC System/6000 (TM) line 

in field 527. Field 529 allows the entry of custom code for of computers which run on Uie AIX (TM) operating system, 

filtering code behaviors outside the predefined filters. Spc- lo FIG. 7, a computer 710, comprising a system unit 711, 

cial cases can be accommodated by adding a function in a keyboard 712, a mouse 713 and a display 714 are depicted 

Perl, Java, JavaScript or a specialized filter scripting Ian- in block diagram form. The system unit 711 includes a 

guage. Push button 531 allows the user to change to a system bus or plurality of system buses 721 lo which various 

different filter definition. components are coupled and by which communication 

Each filter definition is stored in a filter definition data- between the various components is accomplished. The 
base accessible by the pass through pubhsher. The publisher 65 micropn)cessor 722 is connected to the system bus 721 and 

uses the filter definition to break the content into component is supported by read only memory (ROM) 723 and random 

parts: The title area, primary and secondary advertisements, access memory (RAM) 724 also connected to system bus 
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721. A microprocessor in the IBM PC series of computers is mean one who requests or gets the file, and "server" is the 

one of the Inie! family of microprocessors including the 386, entity which downloads the file. Moreover, although the 

486 or Pentium microprocessors. However, other micropro- present invention is described in the context of the Hypertext 

cessors including, but not limited to, Motorola's family of Maricup Language (HTML), those of ordinary skiU in the an 
microprocessors such as the 68000, 68020 or the 68030 5 will appreciate that the invention is applicable to alternative 

microprocessors and various Reduced Instruction Set Com- markup languages including, without limitation, SGML 

puler (RISC) microprocessors such as the PowerPC chip (Standard Generalized Markup Unguage), dynamic HTML 

manufactured by IBM might be used by the present inven- and XML (Extended Markup Language), 

tioo. Other RISC chips made by Hewlett Packard, Sun, Moreover, while the preferred embodiment is illustrated 
Motorola and others may be used in the specific computer. lO the context of a dialup network and the Internet, this is not 

The ROM 723 contains among other code the Basic a limitation of the present invention. The invention can also 

Input-Output system (BIOS) which controls basic hardware be implemented in an intranet environment where a large 

operations such as the interaction of the processor with the organization may have several content provider units which 

disk drives and the keyboard. The RAM 724 is the main provide content for content using units which target different 

memory into which the operating system and application customer segments and have different trade identities. Thus, 

programs are loaded. The memory management chip 725 is while the content using units may utilize much of the same 

connected to the system bus 721 and controls direct memory information, each will want to recast the information in a 

access operations including, passing data between the RAM different look and feel to project their own trade dress. 

724 and hard disk drive 726 and floppy disk drive 727. The There are many possible approaches to creating parsing 

CD ROM drive 732 also coupled to the system bus 721 is f^j. jjj^ invention. For predictable sets of documents 

used to store a large program or amount of data, e.g., a the approaches are straight forward. Where the documents 

multimedia program or presentation. vary a great deal some intelligence in either the pass-through 

Also connected to this system bus 721 are various I/O mechanism or the user who is configuring the filter is 

controllers: The keyboard controller 728, the mouse con- required. In other words, either a user needs to customize the 

troUer 729, the video controUer 730, and the audio controUer filtering at a level nearing programming or scripting skills to 

731. As might be expected, the keyboard controUer 728 account for aU the possible variations after study of a sample 

provides the hardware interface for the keyboard 712, the set of documents, or the pass-through mechanism needs to 

mouse controller 729 provides the hardware interface for be imbued with some level of fuzzy logic or artificial 

mouse 713, the video controller 730 is the hardware inter- intelligence. 

face for the display 714, and the audio controller 731 is the if an agreed on set of tags used by the web content 

hardware interface for the speakers 715. An I/O conU-oller provider and hosting sites, 100% of Web documents are 

740 such as a Token Ring Adapter enables communication parseablc. Thus, no intelligence is required from the pass 

over a network 746 to other similarly configiu-ed data through mechanism and no programming or scripting is 

processing systems. required of the user Special tags are embedded in the source 

One of the preferred implementations of the invention is of the targeted document(s) which identify the content areas. 

assetsofinstructions748-752residenlintherandomaccess This allows a ^default' filter to be used that requires no 

memory 724 of one or more computer systems configured customization beyond supplymg it with the target URL. 

generally as described above. Until required by the com- These special tags could uke the form of HTML comments, 

puter system, the set of instructions may be stored in another In the fiiture, the tags can be formalized as an XML 

computer readable memory, for example, io the hard disk Document TVpe Definition. It is envisioned that HTML 

drive 726, or in a removable memory such as an optical disk editing programs used by the content provider can add the 

for eventual use in the CD-ROM 732 or in a floppy disk for tags as the web content is created automaticaUy. 

eventual use in the floppy disk drive 727. Further, the set of The speed of document retrieval is an issue with the 

instructions can be stored in the memory of another com- invention, since in essence a single user's request for a 

puter and transmitted in a transmission means such as a local document is transformed into two separate requests, with all 

area network or a wide area network such as the Internet the potential for bottlenecks that any Web transaction has. 

when desired by the user. One skilled in the art knows that Caching can provide a partial solution, the title area, article 

storage or transmission of the computer program product body and other desired content can be cached locally on the 

changes the medium electrically, magnetically, or chemi- hosting site, so that it can be delivered to the user more 

cally so that the medium carries computer readable infor- quickly. Ad source needs to be retrieved fi-om the source site 

mation. on a per-user basis to preserve the ad accounting process of 

Further, the invention is often described in terms that many web sites. In addition, many ad systems serve ads 

could be associated with a human operator. While the based on the visitor's browser or other mformation. 
operations performed may be in response to user input, no 55 The invention can be configured a stand alone server 

action by a human operator is desirable in any of the software product. This would resemble a proxy server and 

operations described herein which form part of the present would serve two purposes: it would help the speed issue by 

invention; the operations are machine operations processing devoting more resources to the hosting activity, and it would 

electrical signals to generate other elearical signals. allow the servicing of several hosting web sites from a single 

As used herein, "Web client" should be broadly construed 60 server, 

to mean any computer or component thereof directly or The invention solves several business and technical prob- 

indirectly connected or connectable in any known or later- lems. It provides an attractive mechanism to obtain permis- 

developed manner to a computer network, such as the sion to reprint Web-based content with Uttle or no licensing 

Internet. The term "Web server" should also be broadly fees. Since the original publisher's transaction records are 
construed to mean a computer, computer platfonn, an 65 preserved, their existing revenue base is maintained through 

adjunct to a computer or platform, or any component the number of ad impressions counted. Since the ad impres- 

ihereof. Of course, a "client" should be broadly construed to sions are now also occurring on the hosting web site with 
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very little work on the part of the original publisher, the 
revenue is very likely to be increased. Thus, increased traffic 
is generated for both the hosting web site as well as the 
content provider's site with very little manual intervention 
after configuration. 

The invention is very flexible and is easily configured to
accommodate a wide variety of web content. Through the
use of document templates and standard filters, the invention
allows simple modification of these elements to tailor them
to any number of different content providers' formats and
document templates. Once the hosting web server has been
configured for a set of content providers, the production staff
necessary to republish articles is minimal. Content can be
extracted without the content provider web site modifying
content to a special format or installing special purpose
software. Articles in the hosting web site are automatically
synchronized with those in the content provider as changes
are made at the content provider web site (so long as
noncached material is used). By abstracting the content from
any particular content provider site and reformatting the
content to the hosting site's format a consistent look and feel
is maintained.
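
As an illustration only, the filter-driven extraction described in this specification might be sketched as follows. The names FilterDefinition, extract_between and apply_filter, the comment-style delimiters and the sample page are assumptions made for this sketch; the specification does not prescribe a data format or a programming language for filter definitions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class FilterDefinition:
    """Hypothetical filter definition: a name plus begin/end delimiter pairs."""
    name: str
    # component name -> (begin delimiter, end delimiter), as entered by the designer
    delimiters: Dict[str, Tuple[str, str]]
    # components the filter designer chose to keep on the recast page
    keep: Tuple[str, ...] = ("article",)


def extract_between(page: str, begin: str, end: str) -> Optional[str]:
    """Return the text between the first `begin` and the next `end`, or None."""
    start = page.find(begin)
    if start == -1:
        return None
    start += len(begin)
    stop = page.find(end, start)
    if stop == -1:
        return None
    return page[start:stop].strip()


def apply_filter(page: str, flt: FilterDefinition) -> Dict[str, str]:
    """Split a page into the kept components; components not found are skipped."""
    parts: Dict[str, str] = {}
    for component in flt.keep:
        begin, end = flt.delimiters[component]
        text = extract_between(page, begin, end)
        if text is not None:
            parts[component] = text
    return parts


if __name__ == "__main__":
    # Assumed sample page using HTML comments as content-area markers.
    sample_page = (
        "<!-- BEGIN TOP AD --><IMG SRC=ad.gif><!-- END TOP AD -->"
        "<!-- BEGIN ARTICLE --><H3>Headline</H3><P>Body.</P><!-- END ARTICLE -->"
    )
    flt = FilterDefinition(
        name="example-provider",
        delimiters={
            "top_ad": ("<!-- BEGIN TOP AD -->", "<!-- END TOP AD -->"),
            "article": ("<!-- BEGIN ARTICLE -->", "<!-- END ARTICLE -->"),
        },
        keep=("top_ad", "article"),
    )
    print(apply_filter(sample_page, flt))
```

Simple begin/end string matching is used here because the filter entry fields are described as a beginning and an ending for each component; a production filter would likely need the custom code hooks the specification mentions for pages with inconsistent structure.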

In one preferred embodiment of the invention, the hosting
web server caches content locally to speed delivery to the
requesting client and minimize dependency on the content
provider web site. In other embodiments of the invention,
unauthorized requests are blocked, eliminating a potential
avenue for abuse of the system and copyright violation.
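
A minimal sketch of these two embodiments, under stated assumptions, is given below. The class PassThroughCache, the injected callables check_version and fetch_and_extract, and the use of a host allow-list as the test for an "authorized" request are all hypothetical; the specification does not say how a newer version is detected or how authorization is decided.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set
from urllib.parse import urlparse


@dataclass
class CacheEntry:
    version: str   # opaque version marker reported for the original page
    content: str   # extracted content, ready for insertion into the template


class PassThroughCache:
    """Serve cached extracted content unless the original page has changed."""

    def __init__(self, allowed_hosts: Set[str],
                 check_version: Callable[[str], str],
                 fetch_and_extract: Callable[[str], str]) -> None:
        self.allowed_hosts = allowed_hosts          # assumed authorization rule
        self.check_version = check_version          # assumed cheap freshness probe
        self.fetch_and_extract = fetch_and_extract  # assumed fetch + filter step
        self.entries: Dict[str, CacheEntry] = {}

    def get(self, url: str) -> str:
        # Block requests for providers that are not configured on this host.
        host = urlparse(url).hostname
        if host not in self.allowed_hosts:
            raise PermissionError(f"content provider not authorized: {host}")
        # Reuse the cached copy only when no more recent version exists.
        current = self.check_version(url)
        entry = self.entries.get(url)
        if entry is not None and entry.version == current:
            return entry.content
        content = self.fetch_and_extract(url)
        self.entries[url] = CacheEntry(version=current, content=content)
        return content
```

Consistent with the description, only the title area, article body and similar content would pass through such a cache; advertisement source would still be requested from the content provider for each user so that ad accounting is preserved.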

In the attached appendix, examples are given of a content
provider's original web page, the template in which the
hosting site inserts the excerpted desired content and the
resulting recast page with comments. These examples will
help the reader more fully understand the principles of the
present invention.

While the invention has been shown and described with 
reference to particular embodiments thereof, it will be 
understood by those skilled in the art that the invention can 
be practiced, with modification, in other environments. For 
example, although the invention described above can be 
conveniently implemented in a general purpose computer 
selectively reconfigured or activated by software, those 
skilled in the art would recognize that the invention could be 
carried out in hardware, in firmware or in any combination
of software, firmware or hardware including a special purpose
apparatus specifically designed to perform the
described invention. Therefore, changes in form and detail
may be made therein without departing from the spirit and
scope of the invention as set forth in the accompanying 
claims. 
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Original Content Provider HTML: 

IBM Global Services

<http://www.ibm.com/services/articles/whatwedo.html>What we can do for
you<http://www.ibm.com/services/business/>Viewpoints<http://www.ibm.com/services/career/>Careers<http://www.ibm.com/services/business/feature.html>Case
Studies <http://www.ibm.com/services/pressrel/>News<http://www.ibm.com/services/navtools/otherservices.html><http://www.ibm.com/Search>Search
<http://www.ibm.com/services/profservices/index.html>Professional
Services <http://www.as.ibm.com/>Product Support
Services<http://www.ibm.com/globalnetwork/>Network
Services<http://www.ibm.com/services/ourportfolio.html>Our Portfolio

IBM Announces New e-business Services for Security

Builds on popular packaged e-business services offerings

March 24, 1998

BOSTON, Massachusetts, March 24, 1998 ... IBM today announced new global
security services that build on the company's portfolio of e-business services
introduced last October. IBM's e-business offerings help business use networks
and Internet technologies to more securely buy and sell on the Web and improve
internal and external communication. IBM made these announcements at Internet
Commerce Expo.

<../cbus/security.html>IBM Security Services help customers of all sizes
assess and improve security in their computing environments. They address
exposures across operations, including policy and management systems,
applications, networks, systems and physical site security. IBM has the unique
capability as a security services provider to give customers a choice of
individual offerings or a comprehensive, end-to-end security solution.

*IBM is a registered trademark of International Business Machines Corporation
<http://www.ibm.com/> IBM Homepage <http://www.ibm.com/Orders/> Order
<http://www.ibm.com/Assist/> Contact IBM
<http://www.ibm.com/IBM/Employment>
Employment <http://www.ibm.com/Privacy/>Privacy <http://www.ibm.com/Legal/>
Legal

The Hosting Site Web Page Template 
Home

<http://dev2.cross-site.com/apps/top.map> Need Help? Click on the '?'
<http://dev2.cross-site.com/apps/side.map> Need Help? Click on the '?'
<http://dev2.cross-site.com/cs/?section=News&text=news/news.html>News |
<http://C2.dejanews.com/crosssite/>Forums |
<http://dev2.cross-site.com/cs/?section=Columns&text=columns/columns.html>Columns |
<http://dev2.cross-site.com/cs/?section=Resources&text=resources/resources.html>Resources |
<http://dev2.cross-site.com/cs/?section=Downloads&text=downloads/downloads.html>Downloads |
<http://dev2.cross-site.com/cs/?section=Cross-Site&text=about/about.html>About |
<http://dev2.cross-site.com/cs/?section=Products&text=products/products.html&sidebar=products/sidebar.html>Products |
<http://dev2.cross-site.com/cs/?section=Employment&text=employment/employment.html>Employment
<http://dev2.cross-site.com/cs/?sidebar=home/sidebar.html>Home |
<http://dev2.cross-site.com/cs/?section=Search&text=sitesearch/search.html&title=Search&logo=logo.crosssite>Search |
<http://dev2.cross-site.com/cs/?section=Mail&text=mail/mail.html>Email |
<http://dev2.cross-site.com/cs/?section=Contact&text=about/contact.html>Contact
| <http://dev2.cross-site.com/cs/?section=Help&text=support/help.html>Help

©1998 Tivoli Systems

The Recast Web Page (including comments):

(The parsing engine extracted this code from the URL):

<IMG SRC="http://www.ibm.com/services/images/aounh.gif" alt="IBM Global
Services" WIDTH=584 HEIGHT=54 BORDER=0><br>
<TABLE WIDTH=584 CELLSPACING=0 CELLPADDING=0 BORDER=0>
<TR><TD><NOBR><A
HREF="http://www.ibm.com/services/articles/whatwedo.html"
TARGET=_top><IMG SRC="http://www.ibm.com/services/images/foryou3.gif"
ALT="What we can do for you" WIDTH=145 HEIGHT=18 BORDER=0></A><A
HREF="http://www.ibm.com/services/business/" TARGET=_top><IMG
SRC="http://www.ibm.com/services/images/viewpt3.gif" ALT="Viewpoints"
WIDTH=81 HEIGHT=18 BORDER=0></A><A
HREF="http://www.ibm.com/services/career/" TARGET=_top><IMG
SRC="http://www.ibm.com/services/images/careers3.gif" ALT="Careers"
WIDTH=67 HEIGHT=18 BORDER=0></A><A
HREF="http://www.ibm.com/services/business/feature.html" TARGET=_top><IMG
SRC="http://www.ibm.com/services/images/casestdy3.gif" ALT="Case Studies"
WIDTH=90 HEIGHT=18 BORDER=0></A><A
HREF="http://www.ibm.com/services/pressrel/" TARGET=_top><IMG
SRC="http://www.ibm.com/services/images/news3.gif" ALT="News" WIDTH=52
HEIGHT=18 BORDER=0></A><A
HREF="http://www.ibm.com/services/navtools/otherservices.html"><IMG
SRC="http://www.ibm.com/services/images/countrysites.gif" WIDTH=87
HEIGHT=18 BORDER=0></A><A HREF="http://www.ibm.com/Search"
TARGET=_top><IMG
SRC="http://www.ibm.com/services/images/search3.gif" ALT="Search"
BORDER=0></A></NOBR></TD></TR>
</TABLE>

(It then inserted the code into the hosting site's template, thusly:)

<CENTER>
<TABLE BORDER=0>
<TR>
<TD>
<IMG SRC="http://www.ibm.com/services/images/aounh.gif" alt="IBM Global
Services" WIDTH=584 HEIGHT=54 BORDER=0><br>
<TABLE WIDTH=584 CELLSPACING=0 CELLPADDING=0 BORDER=0>
<TR><TD><NOBR><A
HREF="http://www.ibm.com/services/articles/whatwedo.html"
TARGET=_top><IMG SRC="http://www.ibm.com/services/images/foryou3.gif"
ALT="What we can do for you" WIDTH=145 HEIGHT=18 BORDER=0></A><A
HREF="http://www.ibm.com/services/business/" TARGET=_top><IMG
SRC="http://www.ibm.com/services/images/viewpt3.gif" ALT="Viewpoints"
WIDTH=81 HEIGHT=18 BORDER=0></A><A
HREF="http://www.ibm.com/services/career/" TARGET=_top><IMG
SRC="http://www.ibm.com/services/images/careers3.gif" ALT="Careers"
WIDTH=67 HEIGHT=18 BORDER=0></A><A
HREF="http://www.ibm.com/services/business/feature.html" TARGET=_top><IMG
SRC="http://www.ibm.com/services/images/casestdy3.gif" ALT="Case Studies"
WIDTH=90 HEIGHT=18 BORDER=0></A><A
HREF="http://www.ibm.com/services/pressrel/" TARGET=_top><IMG
SRC="http://www.ibm.com/services/images/news3.gif" ALT="News" WIDTH=52
HEIGHT=18 BORDER=0></A><A
HREF="http://www.ibm.com/services/navtools/otherservices.html"><IMG
SRC="http://www.ibm.com/services/images/countrysites.gif" WIDTH=87
HEIGHT=18 BORDER=0></A><A HREF="http://www.ibm.com/Search"
TARGET=_top><IMG
SRC="http://www.ibm.com/services/images/search3.gif" ALT="Search"
BORDER=0></A></NOBR></TD></TR>
</TABLE>
</TD>
</TR>
</TABLE>
</CENTER>
<A NAME="#TOP"></A>
<FONT SIZE="+1" COLOR="#000099" FACE="Arial, Helvetica">
<B>News</B>
</FONT>
<!-- START TOP NAV BUTTONS -->
<TABLE CELLPADDING=0 CELLSPACING=0 BORDER=0 WIDTH=100%>
<TR ALIGN=RIGHT VALIGN=TOP>
<TD BGCOLOR=FFCC33 ALIGN=RIGHT VALIGN=CENTER BORDER=0
WIDTH=300% COLSPAN=2>
<A HREF="http://dev2.cross-site.com/apps/top.map">
<IMG NAME="topbuttons" HEIGHT=35 WIDTH=175
SRC="http://dev2.cross-site.com/images/topbuttons.gif"
BORDER=0 ALT="Need Help? Click on the '?'" ISMAP></A>
</TD>
</TR>
<!-- END TOP NAV BUTTONS -->

(Similarly, the template has this insertion point for the article from the
content provider's document:)

<TABLE BORDER=0>
<TR>
<TD>
</TD>
</TR>
</TABLE>

(Into which the extracted article is inserted:)

<H3>
IBM Announces New e-business Services for Security
<BR><SMALL>Builds on popular packaged e-business services
offerings</SMALL>
</H3>
<P><B>March 24, 1998</B></P>
<P>BOSTON, Massachusetts, March 24, 1998 ... IBM today announced new global
security services that build on the company's portfolio of e-business
services introduced last October. IBM's e-business offerings help business
use networks and Internet technologies to more securely buy and sell on the
Web and improve internal and external communication. IBM made these
announcements at Internet Commerce Expo.
<p> <a href=../cbus/security.html>IBM Security Services</a> help
customers of all sizes assess and improve security in their computing
environments. They address exposures across operations, including policy
and management systems, applications, networks, systems and physical site
security. IBM has the unique capability as a security services provider to
give customers a choice of individual offerings or a comprehensive,
end-to-end security solution.
<BR>
</FONT>
</TD>
</TR>
</TABLE>

(The end result is a unified HTML document with elements from the
publisher's page inserted into the host site's template to create a
seamless whole.)
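
The insertion step illustrated in this appendix, together with the revision of relative links, could be sketched roughly as follows. The marker string INSERTION_POINT, the function names and the example URLs are assumptions for illustration; the appendix shows an empty table cell as the insertion point and does not specify how links are rewritten.

```python
import re
from urllib.parse import urljoin

# Hypothetical marker standing in for the empty <TD></TD> cell in the template.
INSERTION_POINT = "<!-- ARTICLE -->"

# Matches href/src attributes whose value is quoted or unquoted.
_LINK_ATTR = re.compile(r'''(?i)\b(href|src)\s*=\s*(["']?)([^"'\s>]+)\2''')


def absolutize_links(fragment: str, base_url: str) -> str:
    """Rewrite relative href/src values so they resolve against the original page."""
    def fix(match: "re.Match[str]") -> str:
        attr, target = match.group(1), match.group(3)
        return f'{attr}="{urljoin(base_url, target)}"'
    return _LINK_ATTR.sub(fix, fragment)


def recast(template: str, fragment: str, base_url: str) -> str:
    """Insert the extracted fragment at the template's insertion point."""
    return template.replace(INSERTION_POINT, absolutize_links(fragment, base_url), 1)


if __name__ == "__main__":
    template = "<TABLE BORDER=0><TR><TD><!-- ARTICLE --></TD></TR></TABLE>"
    article = "<P><a href=../topic/detail.html>Details</a></P>"
    # The base URL is an assumed example, not taken from the appendix.
    print(recast(template, article, "http://provider.example/news/1998/item.html"))
```

Resolving links against the content provider's URL keeps extracted links, such as the relative link in the article above, pointing at the provider's site once the fragment is served from the hosting site.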



We claim: 

1. A method for recasting web content on a hosting site,
comprising the steps of:

responsive to a request from a client browser for a recast
web page from a hosting web server, generating a
request by the hosting web server for an original web
page from a content provider web server;
parsing the original web page for a first set of desired
content elements;
inserting the first set of desired content elements into a
web page template containing a hosting web server
format, thus creating the recast web page; and
serving the recast web page to the client browser;
wherein the appearance of the recast page when presented
by the client browser is as though all elements originated
at the hosting web server.



2. The method as recited in claim 1, wherein one of the
desired content elements is an advertisement element from
the content provider web server, and the method further
comprises the step of inserting a call back to the content
provider web server for the advertising element.

3. The method as recited in claim 1 further comprising the
steps of:

caching the desired content from the original page at the
hosting web server;
responsive to a second request for the recast page from a
client browser, determining whether there is a more
recent version of the original page at the content
provider server; and
using the cached desired content if there is no more recent
version of the original page to respond to the second
request.

4. The method as recited in claim 3 further comprising the
steps of:

parsing the most recent version of the original web page
for the first set of desired content elements;

inserting the first set of desired content elements into a
web page template containing a hosting web server
format, thus creating a new version of the recast web
page;

serving the new version of the recast web page to the
client browser responsive to the second request; and

caching the desired content elements from the more recent
version of the original page at the hosting server.

5. The method as recited in claim 3 wherein the cached 
desired content elements are cached in the form of the recast 
page. 

6. The method as recited in claim 1 further comprising the 
steps of: 

searching for a filter definition for web pages from the 
content provider server in a store of filter definitions; 
and 

responsive to identifying a filter definition for the content 
provider, using the identified filter definition to parse 
the original page. 

7. The method as recited in claim 1, further comprising 
the steps of: 

searching for a filter definition for web pages from the 
content provider server in a store of filter definitions; 
and 

responsive to a failure to find a filter definition for the
content provider, using a default filter to parse the
original page.

8. The method as recited in claim 1, wherein at least some 
of the content in the recast page originates at a second 
content provider server and the method further comprises 
the steps of: 

responsive to a request from a client browser for a recast 
web page from a hosting web server, generating a 
request by the hosting web server for a second original 
web page from the second content provider web server; 

parsing the second original web page for a second set of 
desired content elements; and 

inserting the first and second set of desired content
elements into a web page template containing a hosting
web server format, thus creating the recast web page;

wherein when presented by a client browser, both the first 
and second set of desired content elements look native 
to the hosting server. 

9. The method as recited in claim 1, wherein the web page
template contains navigational features for the hosting
server.

10. The method as recited in claim 1, further comprising
the step of processing the desired content elements to
eliminate harmful code, prior to insertion in the web page
template.

11. The method as recited in claim 1, further comprising 
the step of revising relative links in the original page to links 
appropriate to the recast web page. 

12. The method as recited in claim 1, further comprising 
the steps of: 

responsive to a second request from a client browser for
a second recast web page from a hosting web server,
generating a request by the hosting web server for a
second original web page from a second content
provider web server;

parsing the second original web page for a first set of
desired content elements;

inserting the first set of desired content elements into a
web page template containing a hosting web server
format, thus creating the second recast web page; and
serving the second recast web page to the client browser;
wherein the appearance of both the first and the second
recast pages when presented by the client browser is as
though all elements originated at the hosting web
server.

13. The method as recited in claim 1, further comprising
the steps of:

determining client specific information about the client
browser from which the request originated;
selecting among a set of web page templates for the
hosting server based on the client specific information,
wherein each of the web page templates contains a
different respective format; and
using the selected web page template for creating the
recast web page.

14. The method as recited in claim 13, wherein a plurality 
of web sites are serviced by the hosting web server and 
respective ones of the web page templates are used for each 
of the web sites. 

15. A system for recasting web content on a hosting site,
comprising:

means for generating a request by the hosting web server 
for an original web page from a content provider web 
server; 

means for parsing the original web page for a first set of 

desired content elements; 
means for inserting the first set of desired content ele- 
ments into a web page template containing a hosting 
web server format, thus creating a recast web page; and 
means for serving the recast web page to a client browser; 
wherein the appearance of the recast page when presented 
by the client browser is as though all elements origi- 
nated at the hosting web server. 

16. The system as recited in claim 15, wherein one of the 
desired content elements is an advertisement element from 
the content provider web server, and the system further 
comprises means for inserting a call back to the content 
provider web server for the advertising element. 

17. The system as recited in claim 15 further comprising: 
a cache for caching the desired content from the original 

page at the hosting web server; and 
means for determining whether there is a more recent 
version of the original page at the content provider 
server; 

wherein the cached desired content is used if there is no 
more recent version of the original page to respond to 
the second request. 

18. The system as recited in claim 17 further comprising: 
a store of URLs to content provider servers for cached 

content in the cache; and 
means for periodically polling the URLs to determine 
whether there is a more recent version of any of the 
cached content at any of the stored URLs. 

19. The system as recited in claim 17 further comprising: 
means for parsing the most recent version of the original 
web page for the first set of desired content elements; 
means for inserting the first set of desired content ele- 
ments into a web page template containing a hosting 
web server format, thus creating a new version of the 
recast web page; and 
means for serving the new version of the recast web page 
to the client browser responsive to the second request; 
wherein the desired content elements from the more 
recent version of the original page are cached at the 
hosting server. 

20. The system as recited in claim 15 further comprising: 

a store of filter definitions; 
means for searching for a filter definition for web pages 
from the content provider server in the store of filter 
definitions. 

21. The system as recited in claim 20 wherein the store of 
filter definitions includes a default filter to be used when no 
filter definition exists for the content provider server. 

22. The system as recited in claim 15, wherein at least 
some of the content in the recast page originates at a second 
content provider server and when the recast page is pre- 
sented by a client browser, both the first and second set of 
desired content elements look native to the hosting server. 

23. The system as recited in claim 15, wherein the web 
page template contains navigational features for the hosting 
server. 

24. The system as recited in claim 15, further comprising 
means for processing the desired content elements to elimi- 
nate harmful code, prior to insertion in the web page 
template. 

25. The system as recited in claim 15, further comprising 
means for revising relative links in the original page to links 
appropriate to the recast web page. 

26. The system as recited in claim 15, further comprising: 
means for determining client specific information about 
the client browser from which the request originated; 
and 

means for selecting among a set of web page templates for 
the hosting server based on the client specific 
information, wherein each of the web page templates 
contains a different respective format; 

wherein the selected web page template is used for 
creating the recast web page. 

27. The system as recited in claim 26, wherein a plurality 
of web sites are serviced by the hosting web server and 
respective ones of the web page templates are used for each 
of the web sites. 

28. A computer program product for recasting web con- 
tent on a hosting site, comprising: 

means for generating a request by the hosting web server 
for an original web page from a content provider web 
server; 

means for parsing the original web page for a first set of 
desired content elements; 
means for inserting the first set of desired content ele- 
ments into a web page template containing a hosting 
web server format, thus creating a recast web page; and 
means for serving the recast web page to a client browser; 
wherein the appearance of the recast page when presented 
by the client browser is as though all elements origi- 
nated at the hosting web server. 

29. The product as recited in claim 28, wherein one of the 
desired content elements is an advertisement element from 
the content provider web server, and the system further 
comprises means for inserting a call back to the content 
provider web server for the advertising element. 

30. The product as recited in claim 28 further comprising: 
means for caching the desired content from the original 
page at the hosting web server; and 
means for determining whether there is a more recent 
version of the original page at the content provider 
server; 

wherein the cached desired content is used if there is no 
more recent version of the original page to respond to 
the second request. 

31. The product as recited in claim 28 further comprising: 
a store of URLs to content provider servers for cached 
content in the cache; and 
means for periodically polling the URLs to determine 
whether there is a more recent version of any of the 
cached content at any of the stored URLs. 

32. The product as recited in claim 28 further comprising: 
means for storing a set of filter definitions; and 
means for searching for a filter definition for web pages 
from the content provider server in the set of filter 
definitions. 

33. The product as recited in claim 28, wherein the web 
page template contains navigational features for the hosting 
server. 

34. The product as recited in claim 28, further comprising 
means for processing the desired content elements to elimi- 
nate harmful code, prior to insertion in the web page 
template. 

35. The product as recited in claim 28, further comprising 
means for revising relative links in the original page to links 
appropriate to the recast web page. 

36. The product as recited in claim 28, further comprising: 
means for determining client specific information about 
the client browser from which the request originated; 
and 

means for selecting among a set of web page templates for 
the hosting server based on the client specific 
information, wherein each of the web page templates 
contains a different respective format; 

wherein the selected web page template is used for 
creating the recast web page. 

37. The product as recited in claim 36, further comprising 
means for servicing a plurality of web sites by the hosting 
web server and respective ones of the web page templates 
are used for each of the web sites. 

* * * * * 